Abstract. The neighbor-joining algorithm is a popular phylogenetics method for constructing trees from dissimilarity maps. The neighbor-net algorithm is an extension of the neighbor-joining algorithm and is used for constructing split networks. We begin by describing the output of neighbor-net in terms of the tessellation of $\overline{\mathcal{M}}_0^n(\mathbb{R})$ by associahedra. This highlights the fact that neighbor-net outputs a tree in addition to a circular ordering, and we explain when the neighbor-net tree is the neighbor-joining tree. A key observation is that the tree constructed in existing implementations of neighbor-net is not a neighbor-joining tree. Next, we show that neighbor-net is a greedy algorithm for finding circular split systems of minimal balanced length. This leads to an interpretation of neighbor-net as a greedy algorithm for the traveling salesman problem. The algorithm is optimal for Kalmanson matrices, from which it follows that neighbor-net is consistent and has optimal radius $\frac{1}{2}$. We also provide a statistical interpretation for the balanced length for a circular split system as the length based on weighted least squares estimates of the splits. We conclude with applications of these results and demonstrate the implications of our theorems for a recently published comparison of Papuan and Austronesian languages.



1. Introduction 

The neighbor-net algorithm was introduced by Bryant and Moulton in [10]. It is a method for constructing split networks [24] from distance measurements, and has been used for evolutionary analyses in linguistics and phylogenetics [32]. Neighbor-net is gaining in popularity because it is as fast as distance-based methods for tree construction, and the split networks output by the algorithm are informative for studying conflicting signals in data. The interpretations of split networks are based on T-theory [21, 25], which is an active research area within mathematics.

Despite the intuitive appeal of split networks for data analysis, a criticism of their use in phylogenetics, and of the neighbor-net algorithm in particular, has been the lack of an obvious tree interpretation. Moreover, although it was remarked in [10] that "neighbor-net is based on the neighbor-joining algorithm of Saitou and Nei [13]", this was meant to indicate analogy at a high level: neighbor-net and neighbor-joining are both agglomerative algorithms, they have similar selection criteria, and they are both consistent. However, despite the obvious similarities between neighbor-net and neighbor-joining, no direct link has been established between the outputs of the algorithms. It is desirable to establish a mathematically precise connection because there have been a number of recent papers "explaining" neighbor-joining [31], both in terms of showing what it optimizes [21] and why it works well in practice [38]. The lack of informative theorems about neighbor-net, coupled with the difficulties in mastering T-theory, has contributed to a sense that interpretations of neighbor-net results "remain messy and subject to a certain degree of subjectivity."¹

We describe the precise connection between neighbor-net and neighbor-joining in Section 2, and in Section 5 we show that our observation can be used to allay concerns that neighbor-net provides no direct phylogenetic tree information. Our result also provides an interpretation of $\overline{\mathcal{M}}_0^n(\mathbb{R})$ as the space of phylogenetic networks. In Section 3 we show that neighbor-net is a greedy algorithm for the traveling salesman problem that minimizes the balanced length of the split system at every step. This extends the notion of balanced length in [45] and the results of [21], where it was shown that neighbor-joining greedily optimizes the balanced length of a tree. In Section 4, we prove that neighbor-net is optimal for Kalmanson dissimilarity maps. This establishes new proofs for results of [9, 15, 17], and provides an analog of Atteson's neighbor-joining robustness theorem [2] for neighbor-net.



2. The mathematics

The main objects of study in this paper are a class of discrete metric spaces called circular decomposable metrics, which include tree metrics as a special case. We begin with an introduction to some fundamental results about these metric spaces. Their study is part of T-theory, and we refer the reader to [25] for a more thorough introduction and survey of the subject. Throughout the paper, $X = \{1, \ldots, n\}$ denotes the finite set on which metrics are defined.

Definition 1. A split $S = \{A, B\}$ is a partition of $X$ into two non-empty blocks. A set of splits is called a split system. The split metric determined by $S$ is the pseudo-metric

$$\delta_S(x, y) = \begin{cases} 0 & \text{if } \{x, y\} \subseteq A \text{ or } \{x, y\} \subseteq B, \\ 1 & \text{otherwise.} \end{cases}$$

Definition 2. A split system $\mathcal{S}$ is pairwise compatible if for every pair of distinct splits $S_1 = \{A, B\}$, $S_2 = \{A', B'\}$ in $\mathcal{S}$, at least one of the intersections

$$A \cap A', \quad A \cap B', \quad B \cap A', \quad B \cap B'$$

is empty.

Definition 3. A dissimilarity map on $X = \{1, \ldots, n\}$ is a function $\delta : X \times X \to \mathbb{R}$ that satisfies $\delta(i, j) = \delta(j, i) \geq 0$ and $\delta(i, i) = 0$. A dissimilarity map $\delta$ satisfies the four point condition if for every four elements $i, j, k, l \in X$, two of the three terms in the following list are equal and greater than or equal to the third:

$$\delta(i, j) + \delta(k, l), \quad \delta(i, k) + \delta(j, l), \quad \delta(i, l) + \delta(j, k).$$

¹ The statement appears in the specific context of a commentary on a paper describing the classification of Bantu languages [37]; we believe that it reflects prevailing sentiment about the neighbor-net algorithm and its utility for evolutionary analyses.

Theorem 4 ([44]). The following are equivalent statements about $\delta : X \times X \to \mathbb{R}$:

(1) There exists a split system $\mathcal{S}$ such that every pair of distinct splits in $\mathcal{S}$ is pairwise compatible, and $\delta = \sum_{S \in \mathcal{S}} \lambda_S \delta_S$ where $\lambda_S > 0$ for all $S \in \mathcal{S}$.

(2) $\delta$ is a metric and satisfies the four point condition.
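The equivalence in Theorem 4 can be illustrated numerically. The following is a minimal sketch, not part of the original paper: the function names and the weighted splits are hypothetical, chosen only so that the first split system is pairwise compatible and the second is not.

```python
from itertools import combinations

def split_metric(split, x, y):
    """delta_S(x, y): 0 if x and y lie in the same block of the split, 1 otherwise."""
    a, _ = split
    return 0 if (x in a) == (y in a) else 1

def metric_from_splits(weighted_splits, taxa):
    """delta = sum over S of lambda_S * delta_S, stored as a dict keyed by ordered pairs."""
    return {
        (x, y): sum(w * split_metric(s, x, y) for s, w in weighted_splits)
        for x in taxa for y in taxa
    }

def satisfies_four_point(delta, taxa, tol=1e-9):
    """For every quartet, the two largest of the three pair sums must agree."""
    for i, j, k, l in combinations(taxa, 4):
        sums = sorted([
            delta[i, j] + delta[k, l],
            delta[i, k] + delta[j, l],
            delta[i, l] + delta[j, k],
        ])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

taxa = [1, 2, 3, 4, 5]
# Hypothetical pairwise compatible splits: five leaf edges and two internal edges.
tree_splits = [
    (({1}, {2, 3, 4, 5}), 1.0),
    (({2}, {1, 3, 4, 5}), 2.0),
    (({3}, {1, 2, 4, 5}), 1.5),
    (({4}, {1, 2, 3, 5}), 1.0),
    (({5}, {1, 2, 3, 4}), 0.5),
    (({1, 2}, {3, 4, 5}), 3.0),
    (({4, 5}, {1, 2, 3}), 2.0),
]
tree_metric = metric_from_splits(tree_splits, taxa)

# The extra split {1,5} | {2,3,4} is incompatible with {1,2} | {3,4,5}.
network_metric = metric_from_splits(tree_splits + [(({1, 5}, {2, 3, 4}), 1.0)], taxa)
```

Under these assumptions, `satisfies_four_point(tree_metric, taxa)` holds while the same check fails for `network_metric`, since the added split breaks pairwise compatibility.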

There is a canonical median graph associated with a split system called the Buneman graph [11]. The Buneman graph of a pairwise compatible split system is a tree, and therefore, in light of Theorem 4, metrics satisfying the four point condition are called tree metrics. They are precisely the metrics $\delta : X \times X \to \mathbb{R}$ for which there is an edge-weighted tree whose leaves are labeled by $X$, and for which $\delta(i, j)$ is the "additive distance" between $i$ and $j$ in the tree.

Theorem 4 provides the necessary ingredients for describing the input and output of the neighbor-joining algorithm. Specifically, neighbor-joining is an efficient algorithm for evaluating a certain function from the set of dissimilarity maps to pairwise compatible split systems. A key feature of the algorithm is that the steps explicitly construct the Buneman tree associated with the output.

The neighbor-net algorithm is similarly explained in terms of certain split systems 
and metrics. The key concept is that of a circular ordering for a finite set X. 

Definition 5. A circular ordering $\pi = \{x_1, \ldots, x_n\}$ is a bijection between $X$ and the vertices of the $n$-cycle $C_n$ such that $x_i$ and $x_{i+1}$ are adjacent vertices of $C_n$. We adopt the convention that $x_{n+1} = x_1$.

Given a circular ordering $\pi$, let $\mathcal{W}_\pi = \{\{\{x_i, x_j\}, \{x_k, x_l\}\} : i < j < k < l \text{ or } l < i < j < k\}$. Note that $\mathcal{W}_\pi$ is a set consisting of pairs of sets constructed from quartets. In what follows we use the notation $(ij; kl)$ to denote the quartet $\{\{x_i, x_j\}, \{x_k, x_l\}\}$.

Definition 6. A split system $\mathcal{S}$ is circular with respect to a circular ordering $\pi = \{x_1, \ldots, x_n\}$ if every split $S \in \mathcal{S}$ is of the form

$$S = \{\{x_{i+1}, \ldots, x_j\}, \{x_{j+1}, \ldots, x_i\}\} \quad \text{for some } i < j.$$

Note that every pairwise compatible split system is circular. 

Definition 7. A dissimilarity map $\delta$ satisfies the Kalmanson conditions [35] with respect to a circular ordering $\pi$ if for every $i < j < k < l$,

$$\delta(x_i, x_j) + \delta(x_k, x_l) \leq \delta(x_i, x_k) + \delta(x_j, x_l),$$
$$\delta(x_i, x_l) + \delta(x_j, x_k) \leq \delta(x_i, x_k) + \delta(x_j, x_l).$$

Given a dissimilarity map $\delta$ that satisfies the Kalmanson conditions with respect to a circular ordering $\pi$, we let $\mathcal{W}_\delta = \{(ij; kl) : \delta(x_i, x_j) + \delta(x_k, x_l) < \delta(x_i, x_k) + \delta(x_j, x_l) \text{ for } i < j < k < l \text{ or } l < i < j < k\}$. Note that $\mathcal{W}_\delta \subseteq \mathcal{W}_\pi$ is a set of quartets given by the strict Kalmanson inequalities.

Theorem 8 {\)M CS]). The following are equivalent statements about $\delta : X \times X \to \mathbb{R}$:

(1) There exists a circular ordering $\pi$ and a split system $\mathcal{S}$ so that $\delta = \sum_{S \in \mathcal{S}} \lambda_S \delta_S$ where every split $S \in \mathcal{S}$ is circular with respect to $\pi$ and $\lambda_S > 0$ for all $S \in \mathcal{S}$.

(2) $\delta$ is a metric and satisfies the Kalmanson conditions with respect to $\pi$.

Moreover, a quartet $(ij; kl) \in \mathcal{W}_\delta$ iff there exists a split $S$ with $\lambda_S > 0$ such that $i, j$ and $k, l$ are in different blocks of $S$.
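As an illustration of Definition 7 and Theorem 8, the sketch below builds a small metric from circular splits and tests the Kalmanson conditions against two orderings. It is not taken from the paper: the helper names, the taxa labels 0..4, and the split weights are all assumptions made for the example.

```python
from itertools import combinations

def metric_from_blocks(n, weighted_blocks):
    """Sum of weighted split metrics; each split is given by one of its blocks."""
    delta = [[0.0] * n for _ in range(n)]
    for block, weight in weighted_blocks:
        for x in range(n):
            for y in range(n):
                if (x in block) != (y in block):
                    delta[x][y] += weight
    return delta

def kalmanson_violations(delta, order, tol=1e-9):
    """Positions i<j<k<l in `order` for which either Kalmanson inequality fails."""
    bad = []
    for i, j, k, l in combinations(range(len(order)), 4):
        xi, xj, xk, xl = (order[p] for p in (i, j, k, l))
        crossing = delta[xi][xk] + delta[xj][xl]
        if (delta[xi][xj] + delta[xk][xl] > crossing + tol
                or delta[xi][xl] + delta[xj][xk] > crossing + tol):
            bad.append((i, j, k, l))
    return bad

def strict_quartets(delta, order, tol=1e-9):
    """Quartets given by strict Kalmanson inequalities, as pairs of frozensets."""
    found = set()
    for i, j, k, l in combinations(range(len(order)), 4):
        xi, xj, xk, xl = (order[p] for p in (i, j, k, l))
        crossing = delta[xi][xk] + delta[xj][xl]
        if delta[xi][xj] + delta[xk][xl] < crossing - tol:
            found.add((frozenset((xi, xj)), frozenset((xk, xl))))
        if delta[xi][xl] + delta[xj][xk] < crossing - tol:
            found.add((frozenset((xl, xi)), frozenset((xj, xk))))
    return found

# Hypothetical circular split system for the ordering 0, 1, 2, 3, 4:
# all trivial splits plus {0,1} | {2,3,4} and {1,2} | {3,4,0}.
blocks = [({t}, 1.0) for t in range(5)] + [({0, 1}, 2.0), ({1, 2}, 1.0)]
delta = metric_from_blocks(5, blocks)
good_order = [0, 1, 2, 3, 4]
bad_order = [0, 2, 1, 3, 4]
```

With these choices the conditions hold for `good_order` and fail for `bad_order`, and the quartet separating {0, 1} from {2, 3} is strict because the split {0,1} | {2,3,4} has positive weight.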

Metrics satisfying condition (1) of Theorem 8 are called circular decomposable metrics, and it is possible to represent them using split graphs. These are described in detail in [10]. Here we merely illustrate the idea with an example (Figure 1(a,b)). Each class of parallel edges corresponds to one split $S \in \mathcal{S}$, and the length of the edges in a class is given by $\lambda_S$. Split graphs are not necessarily unique, but they provide a useful way to visualize a circular decomposable metric. The neighbor-net algorithm outputs a circular ordering for the purpose of visualizing a circular decomposable metric associated to it using split graphs. The algorithm is agglomerative, which means that the circular ordering is constructed iteratively. The boxed Algorithm 1 describes the details of the algorithm. The terms used in its description are defined below:

Definition 9. Let $G$ be a subgraph of the cycle $C_n$ with $n$ vertices and $m$ components. The graph $G$ is called the circular ordering graph. A partial circular ordering $\mathcal{C}$ consists of the graph $G$ together with a bijection between $X$ and the vertices of $G$.

Equivalently, a partial circular ordering is a partition $\mathcal{C}$ of $X$ into ordered sets $\mathcal{C} = \{C_1, \ldots, C_m\}$ where each $C_r \subseteq X$ and $i, j$ are adjacent elements in $C_r$ for some $r$ iff $i, j$ correspond to adjacent vertices in $G$. We use the notation $C_r^{\circ}$ to denote the vertices of degree 0 or 1 in the subgraph corresponding to $C_r$.

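Definition 10. Let $\mathcal{C}$ be a partial circular ordering with $|\mathcal{C}| = m$. A weighting for $\mathcal{C}$ consists of a function $\mu : X \to \mathbb{R}$ such that $\mu(i) \geq 0$ for all $i \in X$, and for each $r \in \{1, \ldots, m\}$, $\sum_{i \in C_r} \mu(i) = 1$. For $x \in X$ and $C_r, C_s \in \mathcal{C}$ we define

$$\delta(C_r, C_s) := \sum_{i \in C_r,\, j \in C_s} \mu(i)\mu(j)\delta(i, j), \text{ and} \qquad (1)$$

$$\delta(x, C_r) := \sum_{i \in C_r} \mu(i)\delta(x, i). \qquad (2)$$

Note that if $|\mathcal{C}| = |X|$ then there is only one weighting for $\mathcal{C}$, i.e., $\mu(i) = 1$ for all $i$. Next, we introduce two types of weightings that lead to interesting neighbor-net algorithms in Sections 3 and 5.

Definition 11. A weighting $\mu : X \to \mathbb{R}$ is a TSP weighting if, for every block $C_r$, $\mu(i) = 0$ for all $i \in C_r$ with $i \notin C_r^{\circ}$.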
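These weightings lead to aggressive greedy algorithms for the traveling salesman problem (Theorem 23).

Algorithm 1: Neighbor-net algorithm
Data: A dissimilarity map $\delta : X \times X \to \mathbb{R}$.
Result: Circular ordering $\pi : X \to C_n$ together with a split system $\mathcal{T}$ of $n - 1$ pairwise compatible splits that are circular with respect to $\pi$.
Let $G$ be the disjoint union of $n$ vertices and $\mathcal{C}$ the partial circular ordering with graph $G$. Let $\mu : X \to \mathbb{R}$ be the weighting for $\mathcal{C}$.
while $|\mathcal{C}| > 1$ do
    for $C_r, C_s \in \mathcal{C}$ do
        Set $Q_\delta(C_r, C_s) = (|\mathcal{C}| - 2)\,\delta(C_r, C_s) - \sum_{C_t \in \mathcal{C} \setminus \{C_r\}} \delta(C_r, C_t) - \sum_{C_t \in \mathcal{C} \setminus \{C_s\}} \delta(C_s, C_t)$.
    end
    [Selection step part 1] Choose a pair $C_{r^*}, C_{s^*} \in \mathcal{C}$ that minimizes $Q_\delta$;
    for $i \in C_{r^*}^{\circ}$, $j \in C_{s^*}^{\circ}$ do
        Set $Q_\delta(i, j) = (|\mathcal{C}| - 4 + |C_{r^*}^{\circ}| + |C_{s^*}^{\circ}|)\,\delta(i, j) - \sum_{t \neq r^*, s^*} \delta(i, C_t) - \sum_{t \neq r^*, s^*} \delta(j, C_t) - \sum_{k \in (C_{r^*}^{\circ} \cup C_{s^*}^{\circ}) \setminus \{i\}} \delta(i, k) - \sum_{k \in (C_{r^*}^{\circ} \cup C_{s^*}^{\circ}) \setminus \{j\}} \delta(j, k)$.
    end
    [Selection step part 2] Choose the pair $i^* \in C_{r^*}^{\circ}$, $j^* \in C_{s^*}^{\circ}$ that minimizes $Q_\delta$;
    [Merge step] Let $u, v$ be the vertices in the circular ordering graph corresponding to $i^*$ and $j^*$. Add the edge $(u, v)$ to the circular ordering graph and coarsen the partition $\mathcal{C}$ by merging $C_{r^*}$ and $C_{s^*}$.
    [Adjustment step] Adjust $\mu(i)$, $i \in C_{r^*} \cup C_{s^*}$, so that $\sum_{i \in C_{r^*} \cup C_{s^*}} \mu(i) = 1$.
    [Tree construction step] Add the split $\{C_{r^*} \cup C_{s^*},\ \bigcup_{t \neq r^*, s^*} C_t\}$ to the distinguished list $\mathcal{T}$.
end
Output the circular ordering $\pi$ and the split system $\mathcal{T}$.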
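The first selection step of Algorithm 1 can be sketched in a few lines. The code below is an illustrative assumption rather than the reference implementation: `cluster_distance` and `q_values` are hypothetical names, blocks are plain lists of taxon indices, and the example metric is an invented tree metric on four taxa.

```python
from itertools import combinations

def cluster_distance(delta, mu, cr, cs):
    """delta(C_r, C_s) from (1): weighted sum of taxon distances between two blocks."""
    return sum(mu[i] * mu[j] * delta[i][j] for i in cr for j in cs)

def q_values(delta, mu, clusters):
    """Q_delta(C_r, C_s) for every pair of blocks, keyed by block indices (r, s)."""
    m = len(clusters)
    d = [[0.0] * m for _ in range(m)]
    for r, s in combinations(range(m), 2):
        d[r][s] = d[s][r] = cluster_distance(delta, mu, clusters[r], clusters[s])
    # Row sums give the two subtracted sums in the definition of Q_delta.
    return {(r, s): (m - 2) * d[r][s] - sum(d[r]) - sum(d[s])
            for r, s in combinations(range(m), 2)}

# Hypothetical tree metric on four taxa with cherries {0, 1} and {2, 3}.
delta = [
    [0, 3, 7, 8],
    [3, 0, 8, 9],
    [7, 8, 0, 5],
    [8, 9, 5, 0],
]
clusters = [[0], [1], [2], [3]]   # initial partial circular ordering: all singletons
mu = [1.0, 1.0, 1.0, 1.0]         # the only weighting when |C| = |X|
q = q_values(delta, mu, clusters)
best = min(q, key=q.get)          # selection step part 1 (ties broken by pair order)
```

In this example the two cherries tie for the minimum of $Q_\delta$, which is the expected behavior of the neighbor-joining criterion on a four-taxon tree metric.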



Definition 12. Let $\mu : X \to \mathbb{R}$ be a weighting for a partial circular ordering $\mathcal{C}$, and consider a new weighting $\mu' : X \to \mathbb{R}$ for the adjustment step of neighbor-net. $\mu'$ is a tree weighting if it satisfies

$$\mu'(i) = \begin{cases} \alpha\,\mu(i) & \text{if } i \in C_r, \\ (1 - \alpha)\,\mu(i) & \text{if } i \in C_s, \end{cases}$$

where $C_r$ and $C_s$ are the two blocks being merged in the merge step and $0 < \alpha < 1$.

Tree weightings are so named because of the following proposition: 

Proposition 13. The split system $\mathcal{T}$ output by neighbor-net on input $\delta$ is pairwise compatible, and in bijection with a binary tree $T$. If $\mu$ is a tree weighting then the tree $T$ is the neighbor-joining tree for $\delta$, where the agglomeration parameter at every step is given by the tree weighting parameter $\alpha$.

Proof: Note that the addition of an edge to the graph $G$ during a run of the algorithm results in a coarsening of the partition $\mathcal{C}$, where two blocks are merged into one. For this reason, if $S_1 = \{A_1, B_1\}$ is a split added before $S_2 = \{A_2, B_2\}$ to $\mathcal{T}$, then either $A_1 \cap A_2 = \emptyset$ or $A_1 \cap B_2 = \emptyset$. To see that the tree determined by $\mathcal{T}$ is the neighbor-joining tree, it suffices to note that selection step 1, together with the adjustment step specified by a tree weighting, is identical to the agglomeration procedure of neighbor-joining. With a tree weighting, selection step 2 and the fixed ordering within clusters have no effect on the adjustment or tree construction steps. If we simply omit selection step 2 and the merge step, the neighbor-net algorithm reduces to neighbor-joining. □

Proposition 13 justifies the term tree construction step in the neighbor-net algorithm and shows that the output of neighbor-net is not only a circular ordering, but also a tree. The connection to the neighbor-joining tree is explored further in Section 5.

The coarsenings of the partition $\mathcal{C}$ in the merge step are also closely related to graph tubings [22]:

Definition 14. Let $G$ be a finite graph. A tube is a proper nonempty set of vertices whose induced graph is a proper, connected subgraph of $G$. A pair of tubes $r, s$ are nested if $r \subset s$ or $s \subset r$. They intersect if they are not nested and $r \cap s \neq \emptyset$, and two tubes are adjacent if $r \cap s = \emptyset$ and $r \cup s$ is a tube. Two tubes are compatible if they do not intersect and are not adjacent. A tubing of $G$ is a set of tubes that are pairwise compatible.

Proposition 15. Let $P_{n-1}$ be the path on $n - 1$ vertices. A labeling of $P_{n-1}$ is a bijection from $\{1, \ldots, n - 1\}$ to $P_{n-1}$. The output of neighbor-net is a labeling of $P_{n-1}$ together with a maximal tubing of its line graph $L(P_{n-1})$.

Proof: Each coarsening of $\mathcal{C}$ corresponds to a tube in $L(P_{n-1})$.

Definition 16 ([12]). For a graph $G$ with $n$ vertices, the graph-associahedron $\mathcal{P}G$ is the convex polytope of dimension $n - 1$ whose face poset is isomorphic to the set of valid tubings of $G$, with the poset order corresponding to nesting of tubes.

The associahedron (denoted by $K_n$) refers to the graph-associahedron of the path $P_{n-1}$, and its vertices are in bijection with tubings of the path.

Proposition 17 (See Figure 1(c,d)). The number of vertices of $K_{n-1}$ is given by the Catalan number $\frac{1}{n-1}\binom{2n-4}{n-2}$. Its vertices are in bijection with tubings of the path $P_{n-2}$, triangulations of the convex $n$-gon, and rooted binary trees with $n - 1$ leaves.

We have listed just a few of the objects in bijection with the vertices of $K_n$. In fact, there are dozens of combinatorial objects enumerated by the Catalan numbers (see [46]). In the context of the neighbor-joining algorithm, Proposition 17 appears as Proposition 3.1(ii) in [35].

Proposition 17 allows us to enumerate the total number of possible outputs of the neighbor-net algorithm.

Proposition 18. The number of possible outputs of neighbor-net for $n$ taxa is

$$\frac{(2n - 5)!}{(n - 3)!}.$$

Proof: The number of distinct circular orderings (where two orderings are equivalent under the action of the dihedral group) is $\frac{1}{2}(n - 1)!$, so the total number of possible outputs is

$$\frac{1}{n-1}\binom{2n-4}{n-2} \cdot \frac{1}{2}(n - 1)! = \frac{(2n - 5)!}{(n - 3)!}. \qquad (3)$$

□

Figure 1. (a) A split network representation of a circularly decomposable metric. Each split $S$ corresponds to a color class, with the length $\lambda_S$ of the edges in the class indicating the size of the split. (b) The metric $\delta$ derived from the splits network. (c) The output of neighbor-net on input $\delta$. The tree is the neighbor-joining tree. Note that its edges are highlighted in the splits network. (d) The associahedron corresponding to the circular ordering $\pi = \{1, 4, 3, 5, 2, 6\}$ and the vertex corresponding to the neighbor-joining tree. (e) The space of phylogenetic networks $\overline{\mathcal{M}}_0^n(\mathbb{R})$ for $n = 6$.

The first numbers are 1, 1, 1, 6, 60, 840, 15120, 332640, 8648640, 259459200, . . . These numbers also appear in another context in computational biology; in genome assembly they are the number of ways that $n$ distinguishable equal-length clones can be interleaved to form one island [40].
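The count in Proposition 18 is easy to check numerically. The sketch below evaluates the closed form for $n = 3, \ldots, 10$ and compares it with the product of the Catalan count and the number of circular orderings; the function names are hypothetical and chosen for this example only.

```python
from math import comb, factorial

def associahedron_vertices(n):
    """Vertices of K_{n-1}: the Catalan number (1/(n-1)) * binom(2n-4, n-2)."""
    return comb(2 * n - 4, n - 2) // (n - 1)

def circular_orderings(n):
    """Circular orderings of n taxa up to the dihedral group: (n-1)!/2, for n >= 3."""
    return factorial(n - 1) // 2

def neighbor_net_outputs(n):
    """Closed form (2n-5)!/(n-3)! for the number of possible outputs."""
    return factorial(2 * n - 5) // factorial(n - 3)

counts = [neighbor_net_outputs(n) for n in range(3, 11)]
products = [associahedron_vertices(n) * circular_orderings(n) for n in range(3, 11)]
```

For $n = 6$ both expressions give 840, matching the number of octants discussed for Figure 1(e).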

Propositions 15 and 17 together establish that the output of neighbor-net is a circular ordering together with the vertex of an associahedron. Equivalently, it is a labeled convex $n$-gon together with a triangulation. Thus, it is natural to consider $\frac{1}{2}(n - 1)!$ associahedra corresponding to the distinct circular orderings. These associahedra can be glued together in a natural way so that faces are identified when the associated subdivisions of the $n$-gon differ by twists along the diagonal [22]. This identification corresponds exactly to the tessellation of a certain space known as $\overline{\mathcal{M}}_0^n(\mathbb{R})$ by associahedra. The space $\overline{\mathcal{M}}_0^n(\mathbb{R})$ consists of the real points of the Deligne-Knudsen-Mumford compactification of the moduli space $\mathcal{M}_0^n$ of Riemann spheres with $n$ labeled punctures. Its tessellation by associahedra is described in [22]. Figure 1(e) shows the example for $n = 6$. One element from the dual tessellation by cubes of dimension $n - 3 = 3$ is also shown. Each cube is divided into 8 octants, and these octants are in bijection with the possible outputs of neighbor-net (by Proposition 18 there are 840 of them). This is summarized as follows:

Remark 19. Neighbor-net is an efficient evaluation of a function from dissimilarity maps to octants in the dual tessellation by cubes of $\overline{\mathcal{M}}_0^n(\mathbb{R})$. The vertices of the cube (or equivalently, each associahedron) can be interpreted as providing the basis for circular decomposable metrics (networks) together with tubings of the path that are in bijection with trees (phylogenies). We therefore refer to $\overline{\mathcal{M}}_0^n(\mathbb{R})$ (or its dual tiling) as the space of phylogenetic networks.²

We note that the relevance of $\overline{\mathcal{M}}_0^n(\mathbb{R})$ to phylogenetics was already mentioned in [6]; however, in that paper it was deemed unsuitable for describing the space of trees, and replaced with a quotient space equivalent to the tropical Grassmannian [41]. It is interesting that $\overline{\mathcal{M}}_0^n(\mathbb{R})$ also appears in the study of genome rearrangements [5]. It should be interesting to explore extensions of neighbor-net that produce, via agglomeration, tubings of line graphs other than $P_{n-1}$, thus leading to more general phylogenetic networks connected to graph associahedra.

² The term phylogenetic network is also used to denote other objects, e.g. see

We conclude this section by noting that our description of neighbor-net has been based on an interpretation of the algorithm as producing only combinatorial output, i.e., a circular ordering $\pi$ together with a tree. In practice, it is possible to obtain weights $\lambda_S$ for the splits in the circular split system $\mathcal{S}$ compatible with $\pi$ in the course of the algorithm. This is done by setting

$$\lambda_S = \frac{1}{2}\left(\delta(x_i, x_{j+1}) + \delta(x_{i-1}, x_j) - \delta(x_i, x_j) - \delta(x_{i-1}, x_{j+1})\right) \qquad (4)$$

for every split $S = \{\{x_i, \ldots, x_j\}, \{x_{j+1}, \ldots, x_{i-1}\}\}$.

The problem with such a procedure is that there is no guarantee that all the $\lambda_S$ will be non-negative, and therefore the result may not be a circular decomposable metric. This may be circumvented by setting $\lambda_S$ to zero if it is negative, but this solution may lead to inaccurate results. For these reasons, a preferable procedure is to use the circular ordering $\pi$ to subsequently estimate the split weights using a non-negative least squares optimization method. This was done in the original neighbor-net implementation [10].
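The closed-form split weights can be sketched as follows. The code assumes the reading of (4) in which the two distances that cross the split enter with a positive sign, so that the weight is non-negative under the Kalmanson conditions; the function names, the 0-based positions, and the example split system are hypothetical.

```python
def metric_from_blocks(n, weighted_blocks):
    """Sum of weighted split metrics; each split is given by one of its blocks."""
    delta = [[0.0] * n for _ in range(n)]
    for block, weight in weighted_blocks:
        for x in range(n):
            for y in range(n):
                if (x in block) != (y in block):
                    delta[x][y] += weight
    return delta

def circular_split_weights(delta, order):
    """Weight of every circular split {x_i..x_j} | rest, keyed by the arc x_i..x_j.

    Arcs are taken over positions 1..n-1, so each circular split appears once.
    """
    n = len(order)
    weights = {}
    for i in range(1, n):
        for j in range(i, n):
            xi, xj = order[i], order[j]
            prev, nxt = order[i - 1], order[(j + 1) % n]
            lam = 0.5 * (delta[xi][nxt] + delta[prev][xj]
                         - delta[xi][xj] - delta[prev][nxt])
            weights[frozenset(order[i:j + 1])] = lam
    return weights

# Hypothetical circular decomposable metric on the ordering 0, 1, 2, 3, 4.
blocks = [({t}, 1.0) for t in range(5)] + [({0, 1}, 2.0), ({1, 2}, 1.0)]
delta = metric_from_blocks(5, blocks)
weights = circular_split_weights(delta, [0, 1, 2, 3, 4])
positive = {arc: w for arc, w in weights.items() if w > 0}
```

On this exact circular decomposable metric the formula recovers the seven input splits and assigns weight zero to the remaining three circular splits; on noisy data some values may be negative, which is the issue described above.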

3. The computer science 

In the previous section we have explained the input and output of the neighbor-net algorithm. In this section, we show that neighbor-net is a greedy algorithm for minimizing the (suitably defined) length of a dissimilarity map with respect to a circular ordering. We begin by extending the formulation of balanced length in [15] from trees to circular decomposable metrics.

We say that a circular ordering $\pi = \{x_1, \ldots, x_n\}$ is consistent with $\mathcal{C}$ if for every pair of adjacent elements $i, j$ in some $C_l \in \mathcal{C}$ there exists a $k$ such that $x_k = i$ and $x_{k+1} = j$. We denote the circular orderings consistent with $\mathcal{C}$ by $o(\mathcal{C})$.

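Definition 20. The balanced length of a dissimilarity map $\delta$ with respect to a partial circular ordering $\mathcal{C}$ is defined to be

$$l(\delta, \mathcal{C}) = \sum_{i < j} \frac{\eta_{\mathcal{C}}(i, j)}{2\,|o(\mathcal{C})|}\,\delta(i, j) = \frac{1}{|o(\mathcal{C})|} \sum_{(x_1, \ldots, x_n) \in o(\mathcal{C})} \frac{1}{2} \sum_{i=1}^{n} \delta(x_i, x_{i+1}).$$

Here $\eta_{\mathcal{C}}(i, j)$ is the number of circular orderings consistent with $\mathcal{C}$ where $i$ is adjacent to $j$.

Remark 21. The partial circular ordering $\mathcal{C}^* = \operatorname{argmin}_{|\mathcal{C}| = 1}\, l(\delta, \mathcal{C})$ is just the shortest traveling salesman tour for the dissimilarity map $\delta$.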

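We extend the notion of a balanced agglomeration scheme from neighbor-joining to neighbor-net:

Definition 22. A balanced TSP weighting is a TSP weighting where, for $i \in C_r^{\circ}$,

$$\mu(i) = \begin{cases} \frac{1}{2} & \text{if } |C_r^{\circ}| = 2, \\ 1 & \text{if } |C_r^{\circ}| = 1. \end{cases}$$

Theorem 23. Let $\mathcal{C}$ be a partial circular ordering ($|\mathcal{C}| = m$) with a balanced TSP weighting and $\delta$ a dissimilarity map. A partial circular ordering $\mathcal{C}'$ of size $|\mathcal{C}'| = m - 1$ that extends $\mathcal{C}$ and minimizes $l(\delta, \mathcal{C}')$ is obtained by finding a pair $C_{r^*}, C_{s^*}$ that minimize

$$Q_\delta(C_r, C_s) = (m - 2)\,\delta(C_r, C_s) - \sum_{C_t \in \mathcal{C} \setminus \{C_r\}} \delta(C_r, C_t) - \sum_{C_t \in \mathcal{C} \setminus \{C_s\}} \delta(C_s, C_t)$$

and then adding an edge between the pair of vertices corresponding to $i^* \in C_{r^*}^{\circ}$, $j^* \in C_{s^*}^{\circ}$ in the circular ordering graph that minimize

$$Q_\delta(i, j) = (m - 4 + |C_{r^*}^{\circ}| + |C_{s^*}^{\circ}|)\,\delta(i, j) - \sum_{t \neq r^*, s^*} \delta(i, C_t) - \sum_{t \neq r^*, s^*} \delta(j, C_t) - \sum_{k \in (C_{r^*}^{\circ} \cup C_{s^*}^{\circ}) \setminus \{i\}} \delta(i, k) - \sum_{k \in (C_{r^*}^{\circ} \cup C_{s^*}^{\circ}) \setminus \{j\}} \delta(j, k).$$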

Proof: Let $\mathcal{C} = \{C_1, \ldots, C_m\}$ be a partial circular ordering. A neighbor-net step consists of adding an edge to $\mathcal{C}$. This constitutes selecting two paths to join (step 1), and then deciding which of the ends of the paths to join (step 2).

Lemma 24. The number of circular orderings consistent with $\mathcal{C}$ is

$$|o(\mathcal{C})| = \frac{1}{2}(m - 1)! \prod_{r=1}^{m} |C_r^{\circ}|.$$

Let $\mathcal{C}_{r,s}$ denote all of the partial circular orderings where there is an edge between endpoints of $C_r$ and $C_s$ in the circular ordering graph. We say that a circular ordering is consistent with $\mathcal{C}_{r,s}$ if it is consistent with one of the partial circular orderings in $\mathcal{C}_{r,s}$. Similarly, we define $o(\mathcal{C}_{r,s})$ to constitute all circular orderings consistent with some partial circular ordering in $\mathcal{C}_{r,s}$. In the following lemma we use the notation $i \sim_C j$ to denote that $i$ and $j$ are in the same block $C \in \mathcal{C}$, and $i$ is adjacent to $j$.

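Lemma 25. The number of circular orderings consistent with $\mathcal{C}_{r,s}$ is $2|o(\mathcal{C})|/(m - 1)$ and

$$\eta_{\mathcal{C}_{r,s}}(i, j) = \begin{cases}
2\,|o(\mathcal{C})|/(m - 1) & \text{if } i \sim_C j \text{ for some } C, \\
2\,|o(\mathcal{C})|\,\mu(i)\mu(j)/(m - 1) & \text{if } i \in C_r,\ j \in C_s, \\
4\,|o(\mathcal{C})|\,\mu(i)\mu(j)/((m - 1)(m - 2)) & \text{if } i \in C_t,\ j \in C_u,\ t \neq u,\ t, u \neq r, s, \\
2\,|o(\mathcal{C})|\,\mu(i)\mu(j)/((m - 1)(m - 2)) & \text{if } i \in C_r,\ j \in C_t,\ t \neq s, \\
2\,|o(\mathcal{C})|\,\mu(i)\mu(j)/((m - 1)(m - 2)) & \text{if } i \in C_s,\ j \in C_t,\ t \neq r, \\
0 & \text{otherwise.}
\end{cases}$$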

The proof of the lemma is elementary. We note that it also makes sense for weightings that are not balanced TSP weightings, except that the effect of the weightings $\mu$ is to alter the $\eta$ so that they count the number of circular orderings consistent with split systems larger than $\mathcal{C}_{r,s}$. For example, if $\mu$ is a tree weighting, then $\eta$ counts the number of circular orderings consistent with the partially resolved tree $T$. For more on this see Definition 38 and Theorem 39.

Figure 2. A step in the neighbor-net algorithm run on the dissimilarity map $\delta$ from Figure 1(b). (a) A partial circular ordering $\mathcal{C}$, $|\mathcal{C}| = 4$, and the 12 circular orderings consistent with it. (b) Selection step part 1 showing $\mathcal{C}_{r^*,s^*}$ where $C_{r^*} = \{6\}$ and $C_{s^*} = \{1, 4\}$; this is a neighbor-joining agglomeration. (c) Selection step part 2 results in a new partial circular ordering $\mathcal{C}'$, $|\mathcal{C}'| = 3$, with 6 adjacent to 1. This last step is what distinguishes neighbor-net from neighbor-joining.

We can now conclude the proof of Theorem 23:

$$\begin{aligned}
l(\delta, \mathcal{C}_{r,s}) &= \sum_{C \in \mathcal{C}} \sum_{i \sim_C j} \frac{1}{2}\delta(i, j) + \frac{1}{2(m - 2)} \sum_{C_t \neq C_u,\ t, u \neq r, s} \delta(C_t, C_u) \\
&\quad + \frac{1}{2}\delta(C_r, C_s) + \frac{1}{2(m - 2)} \sum_{t \neq r, s} \delta(C_r, C_t) + \frac{1}{2(m - 2)} \sum_{t \neq r, s} \delta(C_s, C_t) \\
&= \sum_{C \in \mathcal{C}} \sum_{i \sim_C j} \frac{1}{2}\delta(i, j) + \frac{1}{2(m - 2)} \sum_{C_t \neq C_u} \delta(C_t, C_u) \\
&\quad + \frac{1}{2}\delta(C_r, C_s) - \frac{1}{2(m - 2)} \sum_{C_t \neq C_r} \delta(C_r, C_t) - \frac{1}{2(m - 2)} \sum_{C_t \neq C_s} \delta(C_t, C_s).
\end{aligned}$$

Thus, $l(\delta, \mathcal{C}_{r,s}) = \frac{1}{2(m - 2)} Q_\delta(C_r, C_s) + T$ where $T$ does not depend on $r$ or $s$. In other words, at each step neighbor-net is selecting a pair $(r^*, s^*)$ to join that will minimize the balanced length. The actual minimum balanced length is attained for one of the $|C_{r^*}^{\circ}||C_{s^*}^{\circ}|$ possibilities for adding an edge between $C_{r^*}$ and $C_{s^*}$ in $\mathcal{C}$. Using the same argument as above, it is easy to see that the minimum balanced length is attained when $Q_\delta(i, j)$ is subsequently minimized. □
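The relation between the balanced length and the selection criterion can be checked by brute force on a small instance. The sketch below enumerates circular orderings directly, averages half the tour length over those in which two chosen blocks are joined at their endpoints, and compares the result with $Q_\delta/(2(m-2))$. Everything here is an assumption for illustration: the helper names, the arbitrary dissimilarity matrix, and the partial circular ordering with blocks `[0, 1]`, `[2, 3]`, `[4]`, `[5]`.

```python
from itertools import combinations, permutations

def circular_orderings(n):
    """All circular orderings of 0..n-1 up to rotation and reflection."""
    for perm in permutations(range(1, n)):
        if perm[0] < perm[-1]:            # keep one of each mirror-image pair
            yield (0,) + perm

def adjacent_pairs(order):
    n = len(order)
    return {frozenset((order[k], order[(k + 1) % n])) for k in range(n)}

def ends(block):
    """Endpoints of a path block: the vertices of degree 0 or 1."""
    return {block[0], block[-1]}

def consistent(order, blocks):
    adj = adjacent_pairs(order)
    return all(frozenset(p) in adj for b in blocks for p in zip(b, b[1:]))

def joined(order, br, bs):
    """True if some endpoint of br is adjacent to some endpoint of bs."""
    adj = adjacent_pairs(order)
    return any(frozenset((i, j)) in adj for i in ends(br) for j in ends(bs))

def balanced_length(delta, orderings):
    """Average over the orderings of half the tour length."""
    n = len(orderings[0])
    total = sum(delta[o[k]][o[(k + 1) % n]] for o in orderings for k in range(n))
    return 0.5 * total / len(orderings)

def q_criterion(delta, blocks):
    """Q_delta for all block pairs under the balanced TSP weighting."""
    m = len(blocks)
    mu = {i: 1.0 / len(ends(b)) for b in blocks for i in ends(b)}
    d = [[0.0] * m for _ in range(m)]
    for r, s in combinations(range(m), 2):
        d[r][s] = d[s][r] = sum(mu[i] * mu[j] * delta[i][j]
                                for i in ends(blocks[r]) for j in ends(blocks[s]))
    return {(r, s): (m - 2) * d[r][s] - sum(d[r]) - sum(d[s])
            for r, s in combinations(range(m), 2)}

n = 6
# Arbitrary symmetric dissimilarity map with zero diagonal (illustrative only).
delta = [[0 if i == j else (i + 1) * (j + 1) % 7 + abs(i - j) for j in range(n)]
         for i in range(n)]
blocks = [[0, 1], [2, 3], [4], [5]]
m = len(blocks)
o_c = [o for o in circular_orderings(n) if consistent(o, blocks)]
offsets = []
for (r, s), q_rs in q_criterion(delta, blocks).items():
    o_rs = [o for o in o_c if joined(o, blocks[r], blocks[s])]
    offsets.append(balanced_length(delta, o_rs) - q_rs / (2 * (m - 2)))
```

If the identity holds, every entry of `offsets` is the same constant $T$, so minimizing $Q_\delta$ over block pairs is the same as minimizing the balanced length. The number of consistent orderings, 12, agrees with Lemma 24 for this block structure.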



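Remark 26. Let

$$Z_\delta(C_r, C_s) = -\frac{1}{2(m - 1)(m - 2)} \sum_{C_t \neq C_u} \delta(C_t, C_u) - \frac{1}{2(m - 2)}\,Q_\delta(C_r, C_s).$$

Then

$$l(\delta, \mathcal{C}) = \sum_{C \in \mathcal{C}} \sum_{i \sim_C j} \frac{1}{2}\delta(i, j) + \frac{1}{2(m - 1)} \sum_{C_t \neq C_u} \delta(C_t, C_u)$$

implies that

$$l(\delta, \mathcal{C}) - l(\delta, \mathcal{C}_{r,s}) = Z_\delta(C_r, C_s).$$

The quantity $Z_\delta(C_r, C_s)$ features prominently in [161 ISHl [38] and is based on the "neighborliness measurement" of [29]:

$$Z_\delta(C_r, C_s) = \sum_{\{C_t, C_u\},\ t, u \neq r, s} w(C_r C_s : C_t C_u), \text{ where}$$

$$w(C_r C_s : C_t C_u) = \frac{1}{2(m - 1)(m - 2)}\left(\delta(C_r, C_t) + \delta(C_r, C_u) + \delta(C_s, C_t) + \delta(C_s, C_u) - 2\delta(C_r, C_s) - 2\delta(C_t, C_u)\right).$$

It is interesting to note that the results in [38] are motivated by this alternative formulation of the neighbor-joining criterion. Remark 26 provides further evidence that the "Z-criterion" is a natural formulation for the neighbor-joining criterion, and at the same time explains the meaning of $Z_\delta(C_r, C_s)$ in terms of the balanced length.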

Returning to Remark 21, we have the following interpretation of Theorem 23:

Remark 27. Neighbor-net with a balanced TSP weighting is a greedy algorithm for the traveling salesman problem.

In fact, neighbor-net provides the optimal solution for the TSP when $\delta$ satisfies the Kalmanson conditions (see Theorem 29 in Section 4). It is well known that the TSP can be solved in polynomial time $O(n^2 \log n)$ for Kalmanson matrices [17]; neighbor-net provides an alternative $O(n^3)$ polynomial algorithm. The $O(n^3)$ running time is based on the observation that the TSP and tree weighting schemes can be implemented so that the selection steps are $O(k^2)$, where $k$ is the number of blocks in the partial circular ordering at each step. It should be possible to obtain further improvements in speed by using the ideas developed for fast neighbor-joining [27].

Theorem 23 is restricted to the balanced TSP weighting. We note, however, that there is no practical limitation to using different weightings for the first and second selection steps. We may consider a hybrid algorithm that applies a tree weighting to the first selection step and a balanced TSP weighting to the second. In that case, Proposition 13 together with Theorem 23 show that

Remark 28. Neighbor-net with a hybrid weighting scheme is a greedy algorithm for finding, simultaneously, the tree of minimum balanced length and the circular ordering of minimum length consistent with it.

4. The statistics 

We begin in this section by showing that neighbor-net is a robust algorithm. By this we mean that if the input to neighbor-net is a dissimilarity map $\delta$ that is a perturbation of a circular decomposable metric with respect to a circular ordering $\pi$, neighbor-net outputs the circular ordering $\pi$. We note that in the case of a circular decomposable metric where some of the splits have zero weight, there will be more than one circular ordering consistent with $\delta$. In that case neighbor-net will output one of those circular orderings. A corollary to this is that if $\delta$ is a circular decomposable metric, and equation (4) is used to estimate the distances, then the output is exactly $\delta$, i.e., neighbor-net is a consistent estimator of the parameters of a circular decomposable metric. Implicit in the neighbor-net estimator are assumptions about the variances of the measured distances. These can be interpreted in terms of the weighting scheme used in neighbor-net, and we return to this at the end of the section.

Theorem 29. Suppose that $\delta : X \times X \to \mathbb{R}$ is a dissimilarity map that satisfies the Kalmanson conditions for some circular ordering $\pi$. Then neighbor-net applied to $\delta$ outputs a circular ordering $\pi'$ such that $\mathcal{W}_\delta \subseteq \mathcal{W}_{\pi'}$.

Proof: It suffices to show that at any step of the algorithm, every circular ordering consistent with the partial circular ordering contains all the quartets in $\mathcal{W}_\delta$. Let $\mathcal{C} = \{C_1, \ldots, C_m\}$ be a partial circular ordering consistent with $\pi$ so that if $x_i \in C_r$ and $x_j \in C_s$ and $r < s$ then $i < j$.

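Lemma 30. For every $r < s < t < u$,

$$\delta(C_r, C_s) + \delta(C_t, C_u) \leq \delta(C_r, C_t) + \delta(C_s, C_u),$$
$$\delta(C_r, C_u) + \delta(C_s, C_t) \leq \delta(C_r, C_t) + \delta(C_s, C_u).$$

Proof: This follows directly from the Kalmanson conditions and the requirement that $\sum_{i \in C_r} \mu(i) = 1$ for every $r$. □

Moreover, if for some $a \in C_r$, $b \in C_s$, $x \in C_t$, $y \in C_u$ with $\mu(a), \mu(b), \mu(x), \mu(y) > 0$ we have $(ab; xy) \in \mathcal{W}_\delta$, then $\delta(C_r, C_s) + \delta(C_t, C_u) < \delta(C_r, C_t) + \delta(C_s, C_u)$.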

Next we introduce some notation to simplify the necessary calculations. We set $\delta_{C_r C_s}(C_t) = \delta(C_r, C_t) + \delta(C_s, C_t) - \delta(C_r, C_s)$. This is an analog of the Farris transform [28] for blocks in the partial circular ordering $\mathcal{C}$. Note that

$$Q_\delta(C_r, C_s) = -2\delta(C_r, C_s) - \sum_{C_t} \delta_{C_r C_s}(C_t). \qquad (5)$$
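Identity (5) can be verified directly from the definition of $Q_\delta$ in Algorithm 1. The short computation below is a sketch under the assumption that the sum in (5) ranges over the blocks $C_t$ other than $C_r$ and $C_s$, with $m = |\mathcal{C}|$.

```latex
\begin{align*}
\sum_{t \neq r,s} \delta_{C_r C_s}(C_t)
  &= \sum_{t \neq r,s} \delta(C_r, C_t) + \sum_{t \neq r,s} \delta(C_s, C_t)
     - (m-2)\,\delta(C_r, C_s), \\
-2\,\delta(C_r, C_s) - \sum_{t \neq r,s} \delta_{C_r C_s}(C_t)
  &= (m-4)\,\delta(C_r, C_s) - \sum_{t \neq r,s} \delta(C_r, C_t)
     - \sum_{t \neq r,s} \delta(C_s, C_t) \\
  &= (m-2)\,\delta(C_r, C_s) - \sum_{t \neq r} \delta(C_r, C_t)
     - \sum_{t \neq s} \delta(C_s, C_t) \\
  &= Q_\delta(C_r, C_s).
\end{align*}
```

The last step uses that each of the two sums over $t \neq r$ and $t \neq s$ contains exactly one copy of $\delta(C_r, C_s)$.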

In order to simplify the presentation, we replace every $C_i$ with $i$ in the formulas below. This is mathematically justified by Lemma 30, since blocks in a partial circular ordering behave exactly like elements of the underlying set $X$ with respect to the Kalmanson conditions. For example, by $Q_\delta(i, i + 1)$ in the lemma below, we mean $Q_\delta(C_i, C_{i+1})$, and a proof that $Q_\delta(C_i, C_{i+2}) \geq Q_\delta(C_i, C_{i+1})$ is equivalent to the proof that $Q_\delta(i, i + 2) \geq Q_\delta(i, i + 1)$ by Lemma 30.

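Lemma 31.

$$Q_\delta(i, i + 2) - Q_\delta(i, i + 1) \geq 0.$$

Proof: Let $j = i + 2$, $k = i + 1$. Then

$$Q_\delta(i, j) - Q_\delta(i, k) = \sum_{x \neq i, j, k} \left(\delta(k, x) + \delta(i, j) - \delta(i, k) - \delta(j, x)\right)$$

and $\delta(k, x) + \delta(i, j) - \delta(i, k) - \delta(j, x) \geq 0$ for each $x$ by Lemma 30. □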

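Lemma 32 (The Anarchy Lemma).

$$Q_\delta(i, i + 3) - Q_\delta(i + 1, i + 2) \geq 0.$$

Proof: Let $j = i + 3$, $k = i + 1$, $l = i + 2$. Applying Lemma 30 twice:

$$Q_\delta(i, j) - Q_\delta(k, l) = \sum_{x \neq i, j, k, l} \left(\delta(i, j) + \delta(k, x) + \delta(l, x) - \delta(i, x) - \delta(j, x) - \delta(k, l)\right) \geq 0.$$

□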

Lemma 33. Let $i < x < y < z < j < t$. Then

$$\delta_{xy}(z) + \delta_{xz}(y) + \delta_{yz}(x) + \delta_{xy}(t) + \delta_{xz}(t) + \delta_{yz}(t) \geq 3\delta_{ij}(t) + \delta_{ij}(x) + \delta_{ij}(y) + \delta_{ij}(z).$$

Proof: Note that each of the following inequalities follows directly from Lemma 30:

$$2\delta(x, t) + 2\delta(i, j) \geq \delta(x, i) + \delta(x, j) + \delta(i, t) + \delta(j, t),$$
$$2\delta(y, t) + 2\delta(i, j) \geq \delta(y, i) + \delta(y, j) + \delta(i, t) + \delta(j, t),$$
$$2\delta(z, t) + 2\delta(i, j) \geq \delta(z, i) + \delta(z, j) + \delta(i, t) + \delta(j, t).$$

Summing both sides we obtain the required inequality. □

Figure 3. An illustration of the proof of Lemma 32.

Proposition 34. Suppose that $i < j - 3$. Then there exists $k$ such that

$$Q_\delta(i, j) - Q_\delta(k, k + 1) \geq 0. \qquad (6)$$

Proof: Recall that $|\mathcal{C}| = m$. Suppose without loss of generality that $i = 0$ and $j < m/2$. We will find $i < k < j - 2$ satisfying (6), where the proof is non-constructive and mimics the arguments in Theorem 25 of [38]. In particular, we show that

$$(j - 3) \sum_{0 < x, y < j} \left(Q_\delta(i, j) - Q_\delta(x, y)\right) \geq 0, \qquad (7)$$

so that there exists $i < x, y < j$ with $Q_\delta(i, j) - Q_\delta(x, y) \geq 0$. We first note that

$$Q_\delta(i, j) - Q_\delta(x, y) = \sum_{t} \delta_{xy}(t) - \delta_{ij}(t).$$

We then break this sum into three sections: those that fall between $0$ and $j$, a matching
set of the same size $(j-3)$ that lie beyond $j$, and lastly, all remaining terms. In this
way, the left-hand side of (7) equals

\[ (j-3) \sum_{0<x,y<j} \left( \sum_{z=1}^{j-1} \big( \delta_{xy}(z) - \delta_{ij}(z) \big)
   + \sum_{t=j+1}^{2j-3} \big( \delta_{xy}(t) - \delta_{ij}(t) \big)
   + \sum_{s=2j-3}^{m-1} \big( \delta_{xy}(s) - \delta_{ij}(s) \big) \right). \]

By Lemma 22 the last summation is greater than or equal to zero, and so:
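\[
\begin{aligned}
(j-3) \sum_{0<x,y<j} \big( Q_\delta(i,j) - Q_\delta(x,y) \big)
 &\geq \sum_{0<x,y<j} (j-3) \left( \sum_{z=1}^{j-1} \big( \delta_{xy}(z) - \delta_{ij}(z) \big)
       + \sum_{t=j+1}^{2j-3} \big( \delta_{xy}(t) - \delta_{ij}(t) \big) \right) \\
 &= \sum_{0<x,y<j} \left( \sum_{\substack{z=1 \\ z \neq x,y}}^{j-1} \sum_{t=j+1}^{2j-3}
       \big( \delta_{xy}(z) - \delta_{ij}(z) \big) + \big( \delta_{xy}(t) - \delta_{ij}(t) \big) \right) \\
 &= \sum_{t=j+1}^{2j-3} \sum_{\substack{0<x,y,z<j \\ x \neq y \neq z}}
       \delta_{xy}(z) + \delta_{xz}(y) + \delta_{yz}(x) + \delta_{xy}(t) + \delta_{xz}(t) + \delta_{yz}(t) \\
 &\qquad\qquad - 3\delta_{ij}(t) - \delta_{ij}(x) - \delta_{ij}(y) - \delta_{ij}(z) \;\geq\; 0.
\end{aligned}
\]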



The final inequality follows from Lemma 33. The claim (6) now follows by noting that
repeated application of the argument leads to one of three cases: either we find a pair
of neighbors $k, k+1$ such that $Q_\delta(k, k+1) \leq Q_\delta(i,j)$, or else we find a pair that are
separated by one node (in which case we apply Lemma 31) or a pair that are separated
by two nodes (in which case we apply Lemma 32). □

Returning to the proof of the theorem, it is clear that if we have a strict Kalmanson
inequality on any quartet that separates $i$ and $j$, then the inequalities in Lemmas 31, 32
and Proposition 34 are strict inequalities. Consequently we never join a pair of blocks
that violate a quartet in Ws. If the blocks are of size 1 we are done. Otherwise, it only
remains to show that two neighboring elements $x_r \in C_i$ and $x_{r+1} \in C_{i+1}$ will be selected
to be joined in the minimization of $Q$. This follows directly from the same arguments
used in Lemmas 31 and 32. □

The consistency of neighbor-net now follows easily by observing that for a circular 
decomposable metric, the distances will be correctly inferred using 

Corollary 35 ([9]). Neighbor-net is statistically consistent. 

Moreover, Theorem 29 can be used to obtain a neighbor-net analog of Atteson's
theorem [2] on the optimal radius of neighbor-joining:

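Corollary 36 (Optimal radius). Let $\mathcal{S}$ be a circular split system with respect to a circular
ordering $\pi = \{x_1, \ldots, x_n\}$, $\lambda_S > 0$ for every $S \in \mathcal{S}$, and
$\delta_{\mathcal{S}} = \sum_{S \in \mathcal{S}} \lambda_S \delta_S$ a circular decomposable metric.
If $\epsilon = \min_{S \in \mathcal{S}} \lambda_S$ and $\delta$ is any dissimilarity map with
$\|\delta - \delta_{\mathcal{S}}\|_\infty < \epsilon/2$, then neighbor-net will output a circular
ordering whose split system contains $\mathcal{S}$.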

Proof: It suffices to show that if $\|\delta - \delta_{\mathcal{S}}\|_\infty < \epsilon/2$ then $\delta$ satisfies the Kalmanson
conditions with respect to $\pi$. Let $i < j < k < l$. Then

\[ \delta_{\mathcal{S}}(x_i,x_k) + \delta_{\mathcal{S}}(x_j,x_l) - \delta_{\mathcal{S}}(x_i,x_j) - \delta_{\mathcal{S}}(x_k,x_l)
   = \sum_{S=\{A,B\},\ i,j \in A,\ k,l \in B} 2\lambda_S. \]

Therefore,

\[ \delta(x_i,x_k) + \delta(x_j,x_l) - \delta(x_i,x_j) - \delta(x_k,x_l)
   > \left( \sum_{S=\{A,B\},\ i,j \in A,\ k,l \in B} 2\lambda_S \right) - 2\epsilon \geq 0. \]

A similar argument shows that $\delta(x_i,x_k) + \delta(x_j,x_l) - \delta(x_j,x_k) - \delta(x_l,x_i) > 0$. □

Note that in Corollary 36 the dissimilarity map $\delta$ satisfying $\|\delta - \delta_{\mathcal{S}}\|_\infty < \epsilon/2$ may not
be a metric. Kalmanson matrices (as opposed to metrics) are characterized in [18].
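The radius argument can be illustrated numerically. The Python sketch below is a hedged example rather than part of the proof: it assumes the full circular split system on $n = 7$ taxa with random positive weights, perturbs every distance by symmetric noise of magnitude below $\epsilon/2$, and checks both Kalmanson inequalities for every quartet. The function names `circular_metric` and `is_kalmanson` are hypothetical.

```python
import itertools
import random


def circular_metric(weights, n):
    """Metric sum_S lambda_S * delta_S over circular splits {a..b-1} | rest."""
    d = [[0.0] * n for _ in range(n)]
    for (a, b), lam in weights.items():
        for x in range(a, b):
            for y in itertools.chain(range(a), range(b, n)):
                d[x][y] += lam
                d[y][x] += lam
    return d


def is_kalmanson(d):
    """Check both Kalmanson inequalities for every quartet i < j < k < l."""
    n = len(d)
    for i, j, k, l in itertools.combinations(range(n), 4):
        crossing = d[i][k] + d[j][l]
        if crossing < d[i][j] + d[k][l] or crossing < d[j][k] + d[i][l]:
            return False
    return True


rng = random.Random(1)
n = 7
# Assumption: every circular split is present with a positive weight.
weights = {
    (a, b): rng.uniform(0.5, 1.5)
    for a, b in itertools.combinations(range(n), 2)
}
eps = min(weights.values())
d_s = circular_metric(weights, n)

# Symmetric noise with |noise| < eps / 2 on every pair of taxa.
d = [row[:] for row in d_s]
for x, y in itertools.combinations(range(n), 2):
    noise = rng.uniform(-0.49, 0.49) * eps
    d[x][y] += noise
    d[y][x] += noise

print(is_kalmanson(d_s), is_kalmanson(d))
```

Each Kalmanson difference involves four distances, so noise below $\epsilon/2$ per entry shifts it by less than $2\epsilon$, which is the quantity subtracted in the proof.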

We have already hinted at connections between neighbor-net and the traveling sales- 
man problem in Section 3. Our next theorem demonstrates the consistency of the TSP 
estimate of the circular ordering and is analogous to Theorem 2 of [20] . 

Theorem 37. Let $\delta$ be a generic circular decomposable metric with respect to a circular
ordering $\pi = \{x_1, \ldots, x_n\}$. Then $l(\delta,\sigma) > l(\delta,\pi)$ for any circular permutation
$\sigma = \{y_1, \ldots, y_n\}$ different from $\pi$.

Proof: Since $\delta$ is a circular decomposable metric it must satisfy the Kalmanson
conditions. Therefore there must exist $i < k$, $|k - i| > 1$ such that
$\delta(y_i,y_{i+1}) + \delta(y_k,y_{k+1}) > \delta(y_i,y_k) + \delta(y_{i+1},y_{k+1})$. Consider the circular ordering

\[ \sigma' = \{y_1, \ldots, y_i, y_k, y_{k-1}, \ldots, y_{i+1}, y_{k+1}, y_{k+2}, \ldots, y_n\}. \]

Then $l(\delta,\sigma') < l(\delta,\sigma)$ and therefore $\mathrm{argmin}_\tau\, l(\delta,\tau) = \pi$. □
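The exchange used in this proof is the segment reversal familiar from 2-opt local search for the TSP. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: it builds a circular decomposable metric with all circular splits given random positive weights (so the metric is generic in the sense needed here), starts from a shuffled tour, and applies improving reversals until none remains. The names `circular_metric`, `tour_length`, `two_opt_descent` and `canonical` are hypothetical.

```python
import itertools
import random


def circular_metric(n, rng):
    """Circular decomposable metric for the ordering 0, 1, ..., n-1."""
    d = [[0.0] * n for _ in range(n)]
    for a, b in itertools.combinations(range(n), 2):
        weight = rng.uniform(0.5, 1.5)
        for x in range(a, b):
            for y in itertools.chain(range(a), range(b, n)):
                d[x][y] += weight
                d[y][x] += weight
    return d


def tour_length(d, tour):
    """Length l(delta, sigma) of the closed tour sigma."""
    n = len(tour)
    return sum(d[tour[p]][tour[(p + 1) % n]] for p in range(n))


def two_opt_descent(d, tour):
    """Apply improving segment reversals until no such move remains."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for k in range(i + 2, n):
                if i == 0 and k == n - 1:
                    continue  # these two edges share the vertex tour[0]
                a, b = tour[i], tour[i + 1]
                c, e = tour[k], tour[(k + 1) % n]
                # Replace edges (a, b), (c, e) by (a, c), (b, e) if shorter.
                if d[a][b] + d[c][e] > d[a][c] + d[b][e] + 1e-9:
                    tour[i + 1:k + 1] = reversed(tour[i + 1:k + 1])
                    improved = True
    return tour


def canonical(tour):
    """Rotate so that 0 comes first and fix the direction of travel."""
    p = tour.index(0)
    t = tour[p:] + tour[:p]
    return t if t[1] < t[-1] else [t[0]] + t[:0:-1]


rng = random.Random(2)
n = 8
d = circular_metric(n, rng)
start = list(range(n))
rng.shuffle(start)
result = canonical(two_opt_descent(d, start))
print(result, tour_length(d, result))
```

Because any tour other than $\pi$ admits an improving reversal, the descent can only stop at $\pi$ (up to rotation and reflection), which is what the sketch recovers.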

This result explains why it makes sense to use TSP solutions directly for finding
circular orderings [36].

We now turn to the statistical meaning of the weighting $\mu$ in the neighbor-net
algorithm, and discuss how it should be chosen in practice. We first consider the case of
tree weightings. In this case neighbor-net outputs a circular ordering consistent with the
neighbor-joining tree (Proposition 13). The theory of [20] together with our results
provides a direct interpretation of the agglomeration parameters that can be summarized
as follows:

Definition 38 (Length of a split system). Let $\mathcal{S}$ be a split system that is circular with
respect to some circular ordering and let $\eta_{\mathcal{S}}(i,j)$ be the number of circular orderings
consistent with $\mathcal{S}$ in which $i$ is adjacent to $j$. The length of a dissimilarity map $\delta$
with respect to $\mathcal{S}$ is

id 

Theorem 39. Let $\delta$ be a dissimilarity map, $\mathcal{S}$ a split system that is circular with respect
to some circular ordering, and $\eta_{\mathcal{S}}(i,j)$ defined as above. Let
$\delta^* = \sum_{S \in \mathcal{S}} \lambda_S \delta_S$ ($\lambda_S > 0$)
be the circular decomposable metric obtained from the weighted least squares estimates of
the splits under the assumption that the variance of $\delta(i,j)$ is $\kappa\,\eta_{\mathcal{S}}(i,j)^{-1}$ (with the same
constant $\kappa$ for all $i,j$). Then
The choices of agglomeration parameters for a tree weighting determine $\eta_{\mathcal{S}}(i,j)$ at each
step and are therefore implicit variance assumptions on the distances for the weighted
least squares tree that is being greedily approximated by the algorithm. The balanced
tree weighting scheme for neighbor-net corresponds to balanced neighbor-joining
agglomeration [20]. It should be interesting to explore BIONJ [30] analogs for neighbor-net,
which is easy to do since it only involves adapting the tree weightings. In the case of a
balanced TSP weighting, Theorem 39 explains that the neighbor-net algorithm ignores
nodes once they have two neighbors after agglomeration.

We conclude by remarking that some progress has been made in the development of
statistical models for split networks, suggesting the possibility for maximum likelihood
approaches to finding circular split systems [H HZ].



Our goal in this section is to show how the theorems proved in the previous sections
provide insight into how to use neighbor-net in practice, and into how to infer split
networks. We begin with an observation regarding the distance reduction formula used in
the current implementations of neighbor-net.

The agglomeration scheme proposed in [10] is as follows: Suppose that a circular
ordering contains two blocks $C_r$, $C_s$ that are being agglomerated, where $C_r$ is a union of
two smaller blocks $C_r = C_t \cup C_u$, so that the agglomerated block is
$C_r \cup C_s = C_t \cup C_u \cup C_s$ in that order.



There is an analogous formula for the case when two blocks, each composed of two blocks,
are being joined (the above formula is applied twice).

This weighting is neither a TSP weighting nor a tree weighting. Furthermore, in the
case of agglomeration of a pair of blocks each composed of two blocks, the resulting
weighting $\mu$ depends on the order in which the agglomeration is performed. Thus, the
tree output by neighbor-net using (8) is not necessarily the neighbor-joining tree, whereas
the use of a tree-weighting scheme guarantees this (Proposition 13).

The advantage of producing a circular ordering consistent with the neighbor-joining
tree is that it allows for a direct analysis of the conflicting signals with a tree of interest.
To demonstrate this, we analyzed a published dataset of language structure characters
from Oceanic Austronesian and Papuan languages [26]. The neighbor-net algorithm
was previously used to infer phylogenetic relationships among the languages (Figure S2
from the supplementary materials of [26]). We compared Figure S2, obtained using the
default parameters for neighbor-net (8), with the balanced tree weighting scheme that
produces a neighbor-joining tree. In both cases, the split weights were computed using
the constrained least squares estimation procedure in [10]. The split networks were
visualized using the program SplitsTree4 [33]. Figure 4(left) shows the network for the
balanced tree weighting scheme, together with the neighbor-joining tree corresponding
to the split system output by the algorithm. The circular ordering obtained by using the
default neighbor-net settings is not consistent with this neighbor-joining tree. The ability
to view the neighbor-joining tree in conjunction with the neighbor-net split network is a
direct result of Proposition 13. The representation of the tree together with the network,
as shown in Figure 4(left), is useful for directly using neighbor-net to evaluate the extent
of phylogenetic discordancy with the neighbor-joining tree. For example, we see clearly
that the split between the Papuan and Austronesian (Oceanic) languages is in fact a
split in the neighbor-joining tree. Note that all the edges in the network and tree are
drawn to scale.

Figure 4. Left: Neighbor-net and the neighbor-joining tree for groups
of Papuan and Austronesian languages. Right: The split network inferred
for the optimal circular ordering that was obtained using Concorde.

The interpretation of neighbor-net as a greedy algorithm for the TSP suggests an
analysis of the optimal TSP tour. We computed this tour for the dataset from [26] using
Concorde [1]. The optimal tour, of length 7.541, was found in 0.57 seconds. The length
of this tour should be contrasted with the length of the balanced tree weighting tour,
7.810, which is very close to 7.794, the length of the tour obtained using the default
parameters. The constrained least squares optimization procedure of [10] was applied to
the optimal circular ordering and resulted in the split network shown in Figure 4(right).

The comparison of the two split networks in Figure 4 is interesting. A key observation
in [26] was that the Papuan languages cluster into groups consistent with the
geographical locations of the islands. On the other hand, it was remarked that Bougainville,
which is geographically in between the Bismarck Archipelago and the Central Solomon
Islands, did not cluster in between the languages from those two locations. Figure
4(right) shows that the TSP ordering produces a better overall clustering, albeit still
with the Bougainville languages not sandwiched in the geographically correct location.
Nevertheless, a key new insight that emerges from the network is that Bali, which
appears to be incorrectly grouped, is in fact correctly grouped if one assumes that the
Papuan and Oceanic groups are really two distinct groups (it is then just a
neighbor to Nalik).

Our main conclusion is that the choice of weightings in the neighbor-net algorithm
is important in determining the results, and that care has to be taken in choosing the
weights appropriately. Furthermore, tree weighting algorithms will be useful in cases
where it is desirable to use neighbor-net as a diagnostic tool for exploring
neighbor-joining trees, and TSP algorithms may be useful for direct application in obtaining
circular orderings. In fact, the use of TSP solvers in similar contexts is not new,
appearing in [36] in the context of tree construction and in [S^, where the Concorde program
is used to find a circular ordering from a distance matrix for proteins based on
protein-protein interactions. It also seems important to develop a variant of neighbor-net that
outputs the optimal circular ordering consistent with an arbitrary given tree.

We conclude by noting that neighbor-net can also be used practically as a greedy
algorithm for the TSP. Unlike the naive greedy algorithm, for which many negative results
have been published, neighbor-net exhibits good properties. For example,
the output does not depend on the order of the input, and the algorithm is optimal
for Kalmanson matrices. We experimented with the problem st70.tsp from TSPLIB [42].
The balanced TSP weighting gave a tour of length 759.801, which is only 12% longer than
the optimal tour of length 678.598. As expected, the balanced tree weighting scheme
yielded a longer tour of length 812.613. It will be interesting to explore the improvements
possible with the incorporation of search heuristics such as nearest neighbor interchange
moves. These have been used to significantly improve neighbor-joining in the FastME
program [19].
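For reference, the relative tour lengths quoted above can be recomputed directly from the three reported values. The short Python sketch below does only that arithmetic; the function name `percent_longer` is hypothetical, and the numbers are the tour lengths stated in the text.

```python
def percent_longer(tour, optimal):
    """Relative excess of a tour length over the optimal length, in percent."""
    return 100.0 * (tour - optimal) / optimal


optimal = 678.598        # optimal tour length reported in the text
balanced_tsp = 759.801   # tour from the balanced TSP weighting
balanced_tree = 812.613  # tour from the balanced tree weighting

print(round(percent_longer(balanced_tsp, optimal), 1))   # 12.0
print(round(percent_longer(balanced_tree, optimal), 1))  # 19.7
```

The balanced tree weighting tour is therefore roughly 20% longer than the optimum on this instance, compared with about 12% for the balanced TSP weighting.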